Abstract
Video prediction is a promising task in computer vision with many real-world applications and is worth exploring. Most existing methods generate new frames from appearance features with few constraints, which results in blurry predictions. Recently, several motion-focused methods have been proposed to alleviate this problem. However, owing to the variety and complexity of real-world motions, it is difficult to capture object motions from a video sequence and apply the learned motions to appearance. In this paper, an adaptive hierarchical motion-focused model is introduced to predict realistic future frames. The model combines hierarchical motion modeling with an adaptive transformation strategy, which achieves better motion understanding and application. We train the model end to end and employ adversarial training to improve the quality of the generated frames. Experiments on two challenging datasets, Penn Action and UCF101, demonstrate that the proposed model is effective and competitive with state-of-the-art approaches.
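To make the motion-focused idea concrete: rather than synthesizing pixels directly from appearance features, such methods typically predict a transformation (for example, a dense displacement field) and apply it to a past frame. The sketch below is a generic, minimal NumPy illustration of this transformation-applying step, assuming backward warping with bilinear sampling; the function name `warp_frame` and the flow convention are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp a grayscale frame by a dense flow field.

    frame: (H, W) array; flow: (H, W, 2) per-pixel displacement (dy, dx)
    telling each output pixel where to sample in the source frame.
    Sampling is bilinear, with coordinates clipped at the image border.
    """
    H, W = frame.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates for every output pixel.
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.clip(y0 + 1, 0, H - 1), np.clip(x0 + 1, 0, W - 1)
    wy, wx = sy - y0, sx - x0
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Usage: shift content one pixel to the right (output(x) samples input(x-1)).
frame = np.zeros((4, 4))
frame[1, 1] = 1.0
flow = np.zeros((4, 4, 2))
flow[..., 1] = -1.0
predicted = warp_frame(frame, flow)
```

Because the output is assembled from real source pixels rather than regressed from scratch, warping-based prediction tends to preserve sharp textures, which is the usual motivation for motion-focused models over direct pixel synthesis.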
Acknowledgement
This work is supported by Shenzhen Peacock Plan (20130408-183003656), Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and National Natural Science Foundation of China (NSFC, No.U1613209).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Tang, M., Wang, W., Chen, X., He, Y. (2018). Adaptive Hierarchical Motion-Focused Model for Video Prediction. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_53
DOI: https://doi.org/10.1007/978-3-030-00776-8_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)